Anytime optimal MDP planning with trial-based heuristic tree search
Author
Abstract
Planning and acting in a dynamic environment is a challenging task for an autonomous agent, especially in the presence of uncertain and exogenous effects, a large number of states, and a long-term planning horizon. In this thesis, we approach the problem by considering algorithms that interleave planning for the current state with execution of the chosen decision. The main challenge for the agent is to use its tight deliberation time wisely. One solution is to use determinizations, which simplify the Markov Decision Process (MDP) that describes the uncertain environment to a deterministic planning problem. We introduce an all-outcomes determinization where, unlike in comparable methods, the number of deterministic actions is not exponentially but polynomially bounded in the number of parallel probabilistic effects. We discuss three algorithms that base their decision solely on the solution to a determinization, and show that they have fundamental limitations that prevent optimal behavior even if provided with unlimited resources. The main contribution of this thesis, the Trial-based Heuristic Tree Search (THTS) framework, allows the description of algorithms in terms of only six ingredients that can be mixed and matched at will. We present a selection of ingredients and analyze theoretically which combinations yield asymptotically optimal behavior. Our implementation of the THTS framework, the probabilistic planner PROST, not only allows us to evaluate all anytime optimal algorithms empirically on the benchmarks of the International Probabilistic Planning Competition (IPPC), but also emphasizes the potential of THTS by being the back-to-back winner of the competition in 2011 and 2014. In the final chapter, we introduce the MDP-Evaluation Stopping Problem, the optimization problem faced by participants of IPPC 2014. We show how it can be constructed formally, discuss three special cases that are solvable in practice, and present approximate algorithms based on techniques derived from the solutions to the special cases. Finally, we show theoretically and empirically that all proposed algorithms improve significantly over the application of the state-of-the-art approach.
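To illustrate why the number of deterministic actions matters, the following minimal sketch shows the standard (naive) all-outcomes determinization, not the polynomially bounded construction introduced in the thesis. It assumes an action whose parallel probabilistic effects are independent, and it creates one deterministic action per joint outcome. The function name `naive_all_outcomes` and the outcome labels are hypothetical and chosen only for this example.

```python
from itertools import product


def naive_all_outcomes(action_name, effects):
    """Naive all-outcomes determinization (illustrative assumption only).

    effects: one list of outcome labels per parallel probabilistic effect.
    Returns one deterministic action per joint outcome, so the result
    grows as the product of the per-effect outcome counts.
    """
    return [(action_name, joint) for joint in product(*effects)]


# Three independent binary effects yield 2**3 deterministic actions.
effects = [["x1", "not-x1"], ["x2", "not-x2"], ["x3", "not-x3"]]
det_actions = naive_all_outcomes("a", effects)
print(len(det_actions))  # 8
```

With n binary parallel effects, this naive construction produces 2^n deterministic actions, which is the exponential growth that the abstract contrasts with a polynomial bound.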
Publication date: 2015